(54) OPERATION RECOGNIZING DEVICE

(57)Abstract: 

PURPOSE: To improve the recognition success rate (reduce
failures) by building a system that recognizes motion in a
dynamic image through learning from examples.

CONSTITUTION: A code book for vector quantization is
prepared for every category, and the conversion to symbols
is carried out per category. That is, the feature vector
sequence in a feature storage memory 24 is converted to a
symbol sequence by a quantizing part 25, which refers to the
code book for the corresponding category in a vector
quantization code book storage memory 31. The output of
the quantizing part 25 is stored in a symbol storage
memory 26 and differs for each code book used. In this way
a dynamic image is converted to a symbol sequence, and
during the conversion the likelihood is calculated with a
weight that depends on the distance to the nearest
representative point in the code book. Consequently, by
learning from examples, the motion of a moving body such
as a person in a dynamic image can be recognized stably.



CLAIMS 



[Claim(s)] 

[Claim 1] A motion recognition apparatus comprising: a means for extracting feature quantities
from an input image capturing motions classified into various categories and expressing them as
a vector; a vector quantization means for converting this vector into a symbol sequence with
reference to a code book created beforehand; a means for acquiring, by training on learning data
with this symbol sequence as input data, a time-series model of the motion as a probabilistic
state-transition model corresponding to each category; and a means for calculating, when an
image to be recognized is input, the probability that each of said models generates the motion
to be recognized; the motion corresponding to the model with the highest likelihood among those
probabilities being selected and output as the recognition result; the apparatus being
characterized by having a means for preparing the code book for said vector quantization for
every recognition category, a means for converting a dynamic image into a symbol sequence using
those code books, and a means for calculating the likelihood by applying, at the time of the
conversion, a weight according to the distance to the nearest representative point in the code
book.

[Claim 2] The motion recognition apparatus according to claim 1, characterized in that said
feature extraction means extracts a feature vector using the mesh feature of an image and
optical flow.






* NOTICES * 



JPO and NCIPI are not responsible for any 
damages caused by the use of this translation. 

1. This document has been translated by computer, so the translation may not reflect the
original precisely.

2. **** shows a word which cannot be translated.

3. In the drawings, no words are translated.



DETAILED DESCRIPTION 



[Detailed Description of the Invention] 
[0001] 

[Industrial Application] This invention relates to a motion recognition apparatus that
recognizes the motion patterns of moving objects, such as human beings, from a dynamic image.
[0002] 

[Description of the Prior Art] Pattern recognition for dynamic images has been studied
extensively in recent years. For the recognition of human motion, the inventor devised, in
Japanese Patent Application No. 3-205033, an apparatus that converts the sequence of feature
vectors obtained from a time-series image into a symbol sequence and selects the category with
the highest likelihood based on hidden Markov models (HMMs). This made it possible to recognize
the motion of a person in a dynamic image, but the recognition success rate is about 90%.
[0003] 

[Problem(s) to be Solved by the Invention] In said conventional technique, the feature vectors
obtained from a dynamic image sequence are distributed in a different region of the feature
vector space for every category, but the distributions of different categories may overlap or
lie close together. When the learning data are biased, the overlap between categories can
become large. When the feature vector distributions of different categories overlap in this
way, vector quantization with the created code book sometimes assigns a feature vector that
belongs to one category to a code corresponding to a different category. When data independent
of the learning data are then input to the recognition system, they are assigned to codes of a
similar category other than the one to which they belong, so the likelihood of the HMM falls.
This was a major cause of misrecognition.

[0004] In particular, in the symbol conversion at recognition time, some feature vectors were
converted into symbols not contained in the learning patterns of the category concerned, which
could lower the likelihood of the HMM corresponding to the correct category. This accounted for
many of the misrecognition cases.
[0005] This invention was made to solve said problem, and its purpose is to provide a technique
that can improve the recognition success rate (reduce failures).

[0006] Said and other purposes and novel features of this invention will become clear from the
description of this specification and the accompanying drawings.

[0007] 

[Means for Solving the Problem] To attain said purpose, this invention is a motion recognition
apparatus comprising: a means for extracting feature quantities from an input image capturing
motions classified into various categories and expressing them as a vector; a vector
quantization means for converting this vector into a symbol sequence with reference to a code
book created beforehand; a means for acquiring, by training on learning data with this symbol
sequence as input data, a time-series model of the motion as a probabilistic state-transition
model corresponding to each category; and a means for calculating, when an image to be
recognized is input, the probability that each of said models generates the motion to be
recognized; the motion corresponding to the model with the highest likelihood among those
probabilities being selected and output as the recognition result. The apparatus is
characterized by having a means for preparing the code book for said vector quantization for
every recognition category, a means for converting a dynamic image into a symbol sequence using
those code books, and a means for calculating the likelihood by applying, at the time of the
conversion, a weight according to the distance to the nearest representative point in the code
book.

[0008] Said feature extraction means is characterized by extracting a feature vector using the
mesh feature of an image and optical flow.

[0009] 

[Function] According to the above means, a code book for vector quantization is prepared for
every recognition category, a dynamic image is converted into a symbol sequence using those
code books, and the likelihood is calculated by applying, at the time of the conversion, a
weight according to the distance to the nearest representative point in the code book. The
motion of a moving object such as a person in a dynamic image can therefore be recognized
stably through learning from examples, which improves the recognition success rate (reduces
failures).

[0010] This invention differs from the prior art in that the code book for vector quantization
is constructed for every category.

[0011] 

[Example] Hereafter, an example of this invention is explained in detail with reference to the
drawings.

[0012] Drawing 1 is a block diagram showing the outline configuration of one example of the
motion recognition apparatus of this invention, and drawing 2 is a block diagram showing the
functional configuration of this example.

[0013] In drawing 1, 11 is a picture input device, 12 is a computer, and 13 is external memory
equipment. In drawing 2, 21 is the image input section, 22 the memory for images, 23 the
feature extraction section, 24 the feature storing memory, 25 the quantization section, 26 the
symbol storing memory, 27 the likelihood calculation section, 28 the memory for recognition
results, 29 the model parameter estimation section, 30 the state-transition model storing
memory for recognition, and 31 the vector quantization code book storing memory.

[0014] The basic operation of this example has two phases, learning and recognition. During
learning, the parameters of the state-transition model for recognition are estimated from the
learning data and stored, for every recognition category, in the state-transition model storing
memory 30 for recognition. During recognition, the likelihood of the model corresponding to
each category, stored in the memory 30 by learning, is computed, and maximum likelihood
estimation is performed: the category corresponding to the model with the maximum likelihood is
taken as the recognition result. The processing up to quantization is the same during learning
and recognition. Apart from the part concerning quantization, the operation is basically the
same as in Japanese Patent Application No. 3-205033. The explanation below follows the flow of
processing.

[0015] First, a dynamic image containing a human being in motion is captured by the image input
section 21, such as a TV camera, and stored in the memory 22 for images.
[0016] Next, the feature extraction section 23 obtains a plurality of feature quantities from
the dynamic image. A feature quantity may be extracted from each frame of the dynamic image, or
a single feature quantity, such as motion information between two or more frames, may be
obtained from consecutive frames. In either case a sequence of feature quantities corresponding
to the image sequence is obtained.

[0017] An example of the feature quantities used is given below. As an example that obtains one
feature vector from one frame, the mesh feature shown in drawing 3 can be considered. The image
in the memory 22 for images is first divided into N x M sub-blocks of n x m pixels each, and
the image is binarized in each sub-block. The ratio of black pixels in each sub-block is then
computed, and these ratios form a feature vector of N x M dimensions.
[0018] That is, with aij denoting the ratio of black pixels in mesh (i, j), the feature vector
is the vector of these values arranged in order: fm = (a00, a01, ..., aij, ..., aMN).
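
The mesh feature described above can be sketched as follows. This is a minimal illustration, not the apparatus's actual implementation: the function name `mesh_features`, the binarization `threshold`, and the convention that pixels darker than the threshold count as black are all assumptions made for the example.

```python
def mesh_features(image, n_rows, n_cols, threshold=128):
    """Return the black-pixel ratio of every sub-block, flattened row by row.

    image: list of equal-length rows of grey levels (0..255).
    n_rows, n_cols: number of sub-blocks vertically and horizontally.
    threshold: hypothetical binarization level; darker pixels count as black.
    """
    height, width = len(image), len(image[0])
    if height % n_rows or width % n_cols:
        raise ValueError("image size must be a multiple of the mesh size")
    block_h, block_w = height // n_rows, width // n_cols  # sub-block size in pixels
    features = []
    for i in range(n_rows):
        for j in range(n_cols):
            black = sum(
                1
                for y in range(i * block_h, (i + 1) * block_h)
                for x in range(j * block_w, (j + 1) * block_w)
                if image[y][x] < threshold
            )
            features.append(black / (block_h * block_w))
    return features


# A 4 x 4 toy frame divided into 2 x 2 sub-blocks of 2 x 2 pixels each.
frame = [
    [0, 0, 255, 255],
    [0, 255, 255, 255],
    [255, 255, 0, 0],
    [255, 255, 0, 0],
]
print(mesh_features(frame, 2, 2))  # [0.75, 0.0, 0.0, 1.0]
```

Each frame of the dynamic image would yield one such vector, so an image sequence becomes a sequence of feature vectors.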

[0019] After the feature vectors are obtained, the quantization section 25 converts the vector
sequence into a symbol sequence, which is recorded in the symbol storing memory 26. This is
done by vector quantization: based on a list of representative points for quantization prepared
beforehand, each feature vector is converted into the symbol of the representative point vector
nearest to it. This group of representative points is called a code book. Methods of creating a
code book include the k-means method (reference 1: X. D. Huang, Y. Ariki and M. A. Jack:
"Hidden Markov Models for Speech Recognition", Edinburgh Univ. Press (1990)) and the LBG method
(reference 2: Y. Linde, A. Buzo and R. M. Gray: "An Algorithm for Vector Quantizer Design",
IEEE Trans. Commun., 28, pp. 84-95 (1980)). In this example the code book must also be created
when the models for recognition are trained, and either method is applicable. The distance
measure used may be, for example, the Euclidean distance or the Mahalanobis distance, which
takes the variance of each dimension into account.
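
The nearest-representative-point conversion can be sketched as below. It is a minimal example that assumes the Euclidean distance and a code book already created by some method such as k-means or LBG; the names `euclidean` and `quantize` and the toy code book are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def quantize(vectors, codebook):
    """Map each feature vector to (symbol, distance) of its nearest codeword.

    The symbol is simply the index of the codeword in the code book.
    """
    result = []
    for f in vectors:
        distances = [euclidean(f, m) for m in codebook]
        symbol = min(range(len(codebook)), key=distances.__getitem__)
        result.append((symbol, distances[symbol]))
    return result


# Hypothetical code book with three representative points in two dimensions.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
sequence = [[0.1, 0.0], [0.9, 1.0], [0.2, 0.8]]
symbols = [symbol for symbol, _ in quantize(sequence, codebook)]
print(symbols)  # [0, 1, 2]
```

Returning the distance together with the symbol is a design choice for this sketch: the distance to the nearest representative point is the quantity on which the weighting of the likelihood depends.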
[0020] In this example, as shown in drawing 2, a code book for vector quantization is prepared
for every category, and the aforementioned problem is solved by performing the conversion to
symbols per category. That is, the feature vector sequence in the feature storing memory 24 is
converted into a symbol sequence by the quantization section 25 with reference to the code book
for the corresponding category in the vector quantization code book storing memory 31. The
output of the quantization section 25 is stored in the symbol storing memory 26, and it too
differs for each code book used.

[0021] Because a code book is used for every category, an input belonging to a category is
always assigned to a code of that category, which eliminates misrecognition caused by code
assignment failure.

[0022] An input from a different category is also assigned a code of that category, but the
distance to the representative vector of that code is expected to be large. By imposing a
penalty according to this distance, the likelihood of data from other categories is kept from
becoming high, whichever category the input belongs to. The details of this penalty are
described later.
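
To make the effect concrete, the sketch below quantizes one feature sequence with two category code books and compares the distances to the nearest representative points. The category names are taken from the experiment in this document, but the two-dimensional code books and the input sequence are invented for illustration, and `quantize_per_category` is a hypothetical helper.

```python
import math


def nearest(f, codebook):
    """Return (symbol, distance) of the codeword closest to feature vector f."""
    distances = [math.dist(f, m) for m in codebook]
    k = min(range(len(codebook)), key=distances.__getitem__)
    return k, distances[k]


def quantize_per_category(sequence, codebooks):
    """Quantize one feature sequence once with every category code book.

    Returns {category: (symbols, mean_distance)}.
    """
    result = {}
    for category, codebook in codebooks.items():
        pairs = [nearest(f, codebook) for f in sequence]
        symbols = [k for k, _ in pairs]
        mean_distance = sum(d for _, d in pairs) / len(pairs)
        result[category] = (symbols, mean_distance)
    return result


# Invented code books, one per recognition category.
codebooks = {
    "forehand stroke": [[0.0, 0.0], [0.2, 0.1]],
    "backhand stroke": [[1.0, 1.0], [0.8, 0.9]],
}
# Invented input whose feature vectors lie near the forehand code book.
sequence = [[0.05, 0.0], [0.2, 0.15]]

for category, (symbols, mean_distance) in quantize_per_category(sequence, codebooks).items():
    print(category, symbols, round(mean_distance, 3))
```

Every code book yields a valid symbol sequence, but the mean distance is much larger for the code book of the wrong category, which is what the distance-dependent penalty exploits.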

[0023] The processing so far converts an image sequence into a symbol sequence, and it is the
same during learning and recognition. Of the subsequent processing, recognition is described
first.

[0024] At recognition time, these feature vector sequences are recorded in the feature storing
memory 24. The likelihood calculation section 27 then computes the probability that this
feature vector sequence is generated by each of the models stored in the state-transition model
storing memory 30 for recognition, one model being prepared for each category to be recognized.
For the following explanation, the parameters of a model are defined as follows.
[0025]
$T$: the length of the observed symbol sequence $O = (O_1, O_2, \ldots, O_T)$
$N$: the number of states in the model
$L$: the number of symbols in the model
$S = \{s\}$: the set of states; $s_t$ is the state at time $t$ (it cannot be observed)
$\upsilon = \{\upsilon_1, \upsilon_2, \ldots, \upsilon_L\}$: the set of observable symbols
$A = \{a_{ij} \mid a_{ij} = \Pr(s_{t+1} = j \mid s_t = i)\}$: state transition probabilities;
$a_{ij}$ is the probability of a transition from state $i$ to state $j$
$B = \{b_j(O_t) \mid b_j(O_t) = \Pr(O_t \mid s_t = j)\}$: symbol output probabilities;
$b_j(k)$ is the probability of outputting symbol $\upsilon_k$ in state $j$
$\pi = \{\pi_i \mid \pi_i = \Pr(s_1 = i)\}$: initial state probabilities

The probability that a given model generates a given observed symbol sequence can be obtained
with the forward algorithm (see reference 1) as follows.

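[0026] The probability $\Pr(O \mid \lambda)$ that a model $\lambda = \{A, B, \pi\}$ outputs the
symbol sequence $O = (O_1, O_2, \ldots, O_T)$ is

[0027]
[Equation 1]

$$\Pr(O \mid \lambda) = \sum_{i} \alpha_T(i) \qquad (1)$$

[0028] Here, $\alpha_T(i)$ is the value at $t = T$ of

[0029]
[Equation 2]

$$\alpha_t(i) \equiv \Pr(O_1, O_2, \ldots, O_t, s_t = i \mid \lambda) \qquad (2)$$

which is defined in this way and, specifically, is obtained by

[0030]
[Equation 3]

$$\alpha_t(j) = \Bigl\{ \sum_{i} \alpha_{t-1}(i)\, a_{ij} \Bigr\}\, b_j(O_t) \qquad (3)$$

[0031]
[Equation 4]

$$\alpha_1(i) = \pi_i\, b_i(O_1) \qquad (4)$$

applied recursively.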
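
The forward computation can be sketched in a few lines. This is a minimal, unscaled version that assumes a discrete HMM given as plain Python lists; the function name `forward_likelihood` and the toy parameters are hypothetical. For long sequences a practical implementation would rescale the forward variables or work in the log domain to avoid underflow.

```python
def forward_likelihood(obs, pi, A, B):
    """Pr(O | lambda) for a discrete HMM lambda = (A, B, pi) by the forward algorithm.

    obs: list of symbol indices O_1..O_T
    pi[i]: initial probability of state i
    A[i][j]: transition probability from state i to state j
    B[j][k]: probability of outputting symbol k in state j
    """
    n = len(pi)
    # Initial step: alpha_1(i) = pi_i * b_i(O_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Recursion: alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(O_t)
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ]
    # Termination: Pr(O | lambda) = sum_i alpha_T(i)
    return sum(alpha)


# Toy left-to-right model with two states and two symbols.
pi = [1.0, 0.0]
A = [[0.5, 0.5], [0.0, 1.0]]
B = [[0.9, 0.1], [0.2, 0.8]]
print(forward_likelihood([0, 1], pi, A, B))  # approximately 0.405
```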

[0032] However, because a code book is prepared for every category in this example, as
described above, the likelihood of incorrect categories is overestimated at recognition time.

[0033] This is solved by multiplying $\Pr(O \mid \lambda)$ by a penalty function $P_n(f)$ that
depends on the distance of the feature vector $f$ from the VQ codeword $m_j$.
[0034]
[Equation 5]

$$\Pr'(O \mid \lambda_i) = \Pr(O \mid \lambda_i) \prod_{k} P_n(f_k) \qquad (5)$$

In this example,

[0035]
[Equation 6]

$$P_n(f) = \exp(\,****\,) \qquad (6)$$

[0036] was used. Since the sigma values resulting from LBG can be regarded as almost the same,
sigma can be made common to all classes, using the overall average. When a large amount of
learning data is available relative to the dimension of the feature vector, sigma can further
be obtained and used for every class and every dimension.
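
The weighting of the likelihood by the penalty can be sketched as follows. The exact form of the penalty function is not legible in this text, so the sketch assumes a Gaussian-shaped penalty exp(-d^2 / (2 sigma^2)) of the distance d to the nearest codeword, with a single sigma shared by all classes; this form, the function names, and the numbers are assumptions for illustration only. The product over the sequence is computed as a sum of logarithms.

```python
import math


def log_penalty(distance, sigma):
    """Log of an ASSUMED penalty Pn = exp(-d^2 / (2 sigma^2)); equals 0 at d = 0."""
    return -(distance ** 2) / (2.0 * sigma ** 2)


def penalized_log_likelihood(log_likelihood, distances, sigma):
    """log Pr'(O | lambda_i) = log Pr(O | lambda_i) + sum_k log Pn(f_k)."""
    return log_likelihood + sum(log_penalty(d, sigma) for d in distances)


base = math.log(0.405)  # hypothetical HMM likelihood before the penalty
near = penalized_log_likelihood(base, [0.05, 0.05], sigma=0.5)  # input close to the code book
far = penalized_log_likelihood(base, [1.2, 1.0], sigma=0.5)  # input far from the code book
print(near > far)  # True
```

Working in the log domain keeps the product of many small penalty values from underflowing and turns the multiplication into an addition.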

[0037] The model for which the likelihood thus obtained is maximal is chosen as the recognition
result and stored in the memory 28 for recognition results. That is, in drawing 2, the
likelihood calculation section 27 calculates, for the symbol sequence converted with the code
book of each category, the likelihood using the parameters of that category's model in the
state-transition model storing memory 30 for recognition, and outputs the result. The above is
the flow of processing at recognition time.
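
Put together, the recognition flow can be sketched as below: quantize the input with each category's code book, score the resulting symbol sequence with that category's model, apply the distance penalty, and pick the maximum. Everything concrete here is an assumption for illustration: the Gaussian-shaped penalty, the shared `sigma`, the toy code books and HMM parameters, and the helper names.

```python
import math


def quantize(sequence, codebook):
    """Return the nearest-codeword symbols and the corresponding distances."""
    symbols, distances = [], []
    for f in sequence:
        d = [math.dist(f, m) for m in codebook]
        k = min(range(len(d)), key=d.__getitem__)
        symbols.append(k)
        distances.append(d[k])
    return symbols, distances


def forward(obs, pi, A, B):
    """Unscaled forward algorithm: Pr(O | lambda) for a discrete HMM."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
    return sum(alpha)


def recognize(sequence, models, sigma):
    """Return (best_category, scores), where scores are penalized log-likelihoods."""
    scores = {}
    for category, (codebook, pi, A, B) in models.items():
        symbols, distances = quantize(sequence, codebook)
        likelihood = forward(symbols, pi, A, B)
        log_lik = math.log(likelihood) if likelihood > 0 else float("-inf")
        # ASSUMED penalty: Pn(f) = exp(-d^2 / (2 sigma^2)), summed in the log domain.
        log_pen = sum(-(d ** 2) / (2.0 * sigma ** 2) for d in distances)
        scores[category] = log_lik + log_pen
    return max(scores, key=scores.get), scores


# Toy models: the same HMM parameters, but a different code book per category.
hmm = ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.9, 0.1], [0.2, 0.8]])
models = {
    "forehand stroke": ([[0.0, 0.0], [0.2, 0.1]], *hmm),
    "backhand stroke": ([[1.0, 1.0], [0.8, 0.9]], *hmm),
}
best, scores = recognize([[0.05, 0.0], [0.2, 0.15]], models, sigma=0.5)
print(best)  # forehand stroke
```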

[0038] Next, the flow of processing during learning is described. For the two or more symbol
sequences obtained from the learning data given for every category, the model parameter
estimation section 29 estimates the parameters of a state-transition model that generates those
symbol sequences, and stores them in the state-transition model storing memory 30 for
recognition. Given a symbol sequence O (= O1, O2, ..., OT), the parameters are obtained with
the Baum-Welch algorithm (see reference 1 for the Baum-Welch algorithm). The model obtained in
this way for every category by the model parameter estimation section 29 is stored in the
state-transition model storing memory 30 for recognition and used at recognition time.
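
One re-estimation step of the Baum-Welch algorithm can be sketched as follows. This is a simplified textbook version, not the procedure of the apparatus: it uses a single training sequence (the text above uses two or more per category), no scaling, and assumes all initial probabilities are strictly positive so that no denominator becomes zero. The function name and the toy parameters are hypothetical.

```python
def baum_welch_step(obs, pi, A, B):
    """One Baum-Welch re-estimation of (pi, A, B) from a single symbol sequence.

    Returns (new_pi, new_A, new_B, likelihood), where likelihood is
    Pr(O | lambda) under the parameters passed in.
    """
    n, m, T = len(pi), len(B[0]), len(obs)
    # Forward variables alpha[t][i] and backward variables beta[t][i].
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]
    for i in range(n):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(n):
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j] for i in range(n)) * B[j][obs[t]]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
    likelihood = sum(alpha[T - 1])
    # gamma[t][i]: probability of being in state i at time t, given O.
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)] for t in range(T)]
    # xi[t][i][j]: probability of the transition i -> j at time t, given O.
    xi = [
        [
            [alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / likelihood
             for j in range(n)]
            for i in range(n)
        ]
        for t in range(T - 1)
    ]
    new_pi = gamma[0][:]
    new_A = [
        [sum(xi[t][i][j] for t in range(T - 1)) / sum(gamma[t][i] for t in range(T - 1))
         for j in range(n)]
        for i in range(n)
    ]
    new_B = [
        [sum(gamma[t][j] for t in range(T) if obs[t] == k) / sum(gamma[t][j] for t in range(T))
         for k in range(m)]
        for j in range(n)
    ]
    return new_pi, new_A, new_B, likelihood


# Toy initial model and training sequence.
pi, A, B = [0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.5, 0.5], [0.1, 0.9]]
obs = [0, 1, 1, 0, 1]
likelihoods = []
for _ in range(5):
    pi, A, B, likelihood = baum_welch_step(obs, pi, A, B)
    likelihoods.append(likelihood)
print(likelihoods)  # non-decreasing from one iteration to the next
```

Each iteration cannot decrease the likelihood of the training sequence, which is why the procedure is repeated until the value stops improving.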

[0039] An experimental result for the processing flow described in this example is explained
next. In this experiment, tennis motions were used as the motions to be recognized.
[0040] Six motions were used in this experiment: backhand volley, backhand stroke, forehand
volley, forehand stroke, service, and smash. Four test subjects each performed ten trials of
every category; of these, five were used as learning data and five as data for the recognition
experiment. The mesh feature was used as the feature vector, and the LBG algorithm (see
reference 2) was used for quantization.
[0041] The data for the recognition experiment were applied to the six state-transition models
produced by learning for the six categories, and the category whose model had the maximum
likelihood was chosen as the recognition result. By changing the combination of learning
patterns, ten recognition systems were constructed and their recognition rates evaluated.

[0042] The results of the experiment are shown in Table 1. Compared with the conventional
method, the percentage of correct answers is confirmed to have improved.
[0043]
[Table 1]

      A        B        C        D
    100.0     90.6     97.5     91.7
    100.0    100.0    100.0     98.3


[0044] Although this invention has been explained concretely based on the example, it is of
course not limited to said example and can be modified in various ways without departing from
its gist.
[0045] 

[Effect of the Invention] As explained above, according to this invention the motion
recognition system for dynamic images is constructed by learning from examples, so it can adapt
autonomously to the object and the environment and achieves more advanced recognition than the
conventional technique. The recognition success rate can therefore be improved (failures
reduced).

[0046] Moreover, since model fitting on the image is not involved, robust processing is
possible even for real images. This invention is therefore widely applicable, for example to
monitoring suspicious behavior in a bank or a store and to extracting desired scenes from video
such as sports footage.

DESCRIPTION OF DRAWINGS



[Brief Description of the Drawings] 

[Drawing 1] A block diagram showing the outline configuration of one example of the motion
recognition apparatus of this invention.

[Drawing 2] A block diagram showing the functional configuration of this example.

[Drawing 3] An explanatory view of the mesh feature used in the feature extraction section of
this example.
[Description of Notations]

11: picture input device; 12: computer; 13: external memory equipment; 21: image input section;
22: memory for images; 23: feature extraction section; 24: feature storing memory;
25: quantization section; 26: symbol storing memory; 27: likelihood calculation section;
28: memory for recognition results; 29: model parameter estimation section;
30: state-transition model storing memory for recognition; 31: vector quantization code book
storing memory.

DRAWINGS

[Drawing 1] Block diagram with picture input device 11, computer 12, and external memory
equipment 13.

[Drawing 2] Functional block diagram of blocks 21 to 31.

[Drawing 3] Mesh feature: aij = (number of black pixels in mesh (i, j)) / (Mm Nm).